Machine Learning 4: Convolutional Neural Networks

林嶔 (Lin, Chin)

Lesson 22

Introduction to Convolutional Neural Networks (1)

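– Back to our handwritten digit classification problem: when we look at these handwritten digits we recognize them at a glance, but is the path from "image" to "concept" really that simple?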

F22_1

Introduction to Convolutional Neural Networks (2)

F22_2

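– Their research found that when a cat is stimulated with images of different shapes, the brain cells of the receptive field respond differently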

F22_3

Introduction to Convolutional Neural Networks (3)

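– Convolution filters mimic the earliest cells of the receptive field. Each one is responsible for recognizing a specific feature, and their mathematical form is as follows: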

F22_4

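– What does a "feature map" mean? A convolution filter is like the most basic visual cell: it specializes in recognizing one simple feature. The larger a value in the "feature map", the better that location matches the feature the cell is responsible for.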
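A minimal sketch in base R of what a single filter computes, assuming a stride of 1 and no padding. The helper `conv2d.valid`, the toy image `toy.img` and the filter `diag.filter` are illustrative names that do not come from the course material. Each output cell is the sum of the element-wise product between the filter and the image patch under it, so the largest value appears where the patch looks most like the filter.

```r
# Slide a k x k filter over a matrix (stride 1, no padding) and
# return the feature map of element-wise product sums.
conv2d.valid <- function(img, filt) {
  k <- nrow(filt)
  out <- matrix(0, nrow = nrow(img) - k + 1, ncol = ncol(img) - k + 1)
  for (i in 1:nrow(out)) {
    for (j in 1:ncol(out)) {
      patch <- img[i:(i + k - 1), j:(j + k - 1)]
      out[i, j] <- sum(patch * filt)
    }
  }
  out
}

# A 5x5 toy image with one diagonal stroke in its lower-right corner
toy.img <- matrix(0, nrow = 5, ncol = 5)
toy.img[3, 3] <- toy.img[4, 4] <- toy.img[5, 5] <- 1

# A 3x3 filter that "looks for" a diagonal stroke
diag.filter <- diag(3)

feature.map <- conv2d.valid(toy.img, diag.filter)
print(feature.map)

# Position of the best match between patch and filter
best.match <- which(feature.map == max(feature.map), arr.ind = TRUE)
print(best.match)
```

In this toy case the feature map peaks in its lower-right cell, which is the only patch that contains the whole stroke.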

F22_5

Introduction to Convolutional Neural Networks (4)

F22_6

F22_7

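  1. The original image (28x28x1) first passes through 20 "convolution filters" of size 5x5 (5x5x1x20), which turns it into 20 "feature maps" (24x24x20)

  2. These 20 "feature maps" (24x24x20) then go through a nonlinear transformation, producing 20 "transformed feature maps" (24x24x20)

  3. These 20 "transformed feature maps" (24x24x20) are then processed by a 2x2 "pooling filter" (2x2), which turns them into 20 "downsampled feature maps" (12x12x20)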
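The size changes in the three steps above can be checked with a short sketch in base R. It assumes a stride of 1 with no padding for the convolution, ReLU as the nonlinear transformation, and non-overlapping 2x2 max pooling. The helpers `relu` and `max.pool.2x2` are illustrative names, and the feature map is filled with random numbers rather than a real convolution result.

```r
# ReLU: negative values become 0, dimensions are unchanged
relu <- function(x) {
  x[x < 0] <- 0
  x
}

# Non-overlapping 2x2 max pooling: keep the largest value of each 2x2 block
max.pool.2x2 <- function(x) {
  out <- matrix(0, nrow = nrow(x) %/% 2, ncol = ncol(x) %/% 2)
  for (i in 1:nrow(out)) {
    for (j in 1:ncol(out)) {
      out[i, j] <- max(x[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)])
    }
  }
  out
}

# A 5x5 filter sliding over a 28x28 image fits in 28 - 5 + 1 = 24 positions per axis
conv.size <- 28 - 5 + 1

set.seed(0)
feature.map <- matrix(rnorm(conv.size^2), nrow = conv.size)  # stand-in for one feature map
activated <- relu(feature.map)
pooled <- max.pool.2x2(activated)

dim(feature.map)  # 24 24
dim(activated)    # 24 24
dim(pooled)       # 12 12
```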

Introduction to Convolutional Neural Networks (5)

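– Imagine a picture of a person. Suppose the first filter recognizes eyes, the second recognizes noses, the third recognizes ears, the fourth recognizes palms, and the fifth recognizes arms

– Higher values in feature maps 1, 2 and 3 mark where the eyes, nose and ears are most likely to be. If we look at these 3 feature maps together and convolve once more, can we locate the face?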

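– Higher values in feature maps 4 and 5 mark where the palm and arm are most likely to be. If we look at these 2 feature maps together and convolve once more, can we locate the whole arm?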

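– Feature maps 4 and 5 are also useful for face recognition: because a face contains no palm or arm, a filter that wants to recognize a face should put positive weights on feature maps 1, 2 and 3 and negative weights on feature maps 4 and 5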
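The idea of positive and negative weights across feature maps can be sketched in base R. Everything here is hypothetical: five hand-made 6x6 feature maps (eye, nose, ear, palm, arm) and a second-layer "face" filter `face.filter` whose weights are simply +1 on maps 1 to 3 and -1 on maps 4 and 5. A trained network would learn its own weights.

```r
# Five hypothetical 6x6 feature maps: 1 = eye, 2 = nose, 3 = ear, 4 = palm, 5 = arm
maps <- array(0, dim = c(6, 6, 5))
maps[2, 2, 1:3] <- 1   # eye, nose and ear all respond near position (2, 2)
maps[5, 5, 4:5] <- 1   # palm and arm respond near position (5, 5)

# One second-layer filter: a 3x3 window over all five maps,
# positive weights on maps 1-3 and negative weights on maps 4-5
face.filter <- array(0, dim = c(3, 3, 5))
face.filter[, , 1:3] <- 1
face.filter[, , 4:5] <- -1

# Convolve across all five maps at once (stride 1, no padding)
face.map <- matrix(0, nrow = 4, ncol = 4)
for (i in 1:4) {
  for (j in 1:4) {
    patch <- maps[i:(i + 2), j:(j + 2), ]
    face.map[i, j] <- sum(patch * face.filter)
  }
}
print(face.map)
```

Windows covering the facial features score +3, while windows covering the palm and arm score -2, so the second-layer map separates "face" regions from "arm" regions.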

F22_8

Exercise 1

– This is a picture of parrots

library(imager)

img <- load.image(system.file("extdata/parrots.png", package="imager"))
gary.img <- grayscale(img)
plot(gary.img)

– Let's use a filter with a special structure to obtain its feature map!

conv.filter.1 = matrix(c(-1, -1, -1,
                         -1, +8, -1,
                         -1, -1, -1), nrow = 3)

img.matrix = as.matrix(gary.img)

feature.img = matrix(NA, nrow = nrow(img.matrix) - 2, ncol = ncol(img.matrix) - 2)

for (i in 1:nrow(feature.img)) {
  for (j in 1:ncol(feature.img)) {
    sub.img.matrix = img.matrix[0:2+i,0:2+j]
    feature.img[i,j] = sum(sub.img.matrix * conv.filter.1)
  }
}

new.img = as.cimg(feature.img)
plot(new.img)

– Now try it yourself with other filters, for example:

conv.filter.2 = matrix(c(-1, 0, +1,
                         -2, 0, +2,
                         -1, 0, +1), nrow = 3)
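One way to try several filters without rewriting the double loop is to wrap it in a function. The sketch below is an assumption-laden illustration: `apply.filter` and the synthetic image `step.img` are hypothetical names, and the synthetic image is used so the block runs without the imager package. With the parrot data, the first argument would be the matrix built in the exercise instead.

```r
# Apply a square filter to a matrix (stride 1, no padding)
apply.filter <- function(img.matrix, conv.filter) {
  k <- nrow(conv.filter)
  feature.img <- matrix(NA, nrow = nrow(img.matrix) - k + 1,
                        ncol = ncol(img.matrix) - k + 1)
  for (i in 1:nrow(feature.img)) {
    for (j in 1:ncol(feature.img)) {
      sub.img.matrix <- img.matrix[0:(k - 1) + i, 0:(k - 1) + j]
      feature.img[i, j] <- sum(sub.img.matrix * conv.filter)
    }
  }
  feature.img
}

# Same filter as conv.filter.2; matrix() fills column by column
conv.filter.2 <- matrix(c(-1, 0, +1,
                          -2, 0, +2,
                          -1, 0, +1), nrow = 3)

# Synthetic 8x8 image: first index 1-4 is dark (0), first index 5-8 is bright (1)
step.img <- matrix(rep(c(0, 1), each = 4), nrow = 8, ncol = 8)

edge.map <- apply.filter(step.img, conv.filter.2)
print(edge.map)
```

The filter responds only where the intensity changes along the first matrix index and returns 0 in flat regions. Which image direction that index corresponds to depends on how the image was converted to a matrix.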

Handwritten Digit Recognition with a Convolutional Neural Network (1)

F22_9

Handwritten Digit Recognition with a Convolutional Neural Network (2)

– Let's review the data structure once more

DAT = read.csv("data/train.csv")

#Split data

set.seed(0)
Train.sample = sample(1:nrow(DAT), nrow(DAT)*0.6, replace = FALSE)

Train.X = DAT[Train.sample,-1]/255
Train.Y = DAT[Train.sample,1]
Test.X = DAT[-Train.sample,-1]/255
Test.Y = DAT[-Train.sample,1]

#Display

library(imager)

par(mar=rep(0,4), mfcol = c(4, 4))
for (i in 1:16) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  img = as.raster(t(matrix(as.numeric(Train.X[i,]), nrow = 28)))
  rasterImage(img, -0.04, -0.04, 1.04, 1.04, interpolate=FALSE)
  text(0.05, 0.95, Train.Y[i], col = "green", cex = 2)
}

Handwritten Digit Recognition with a Convolutional Neural Network (3)

Train.X.array = t(Train.X)
dim(Train.X.array) <- c(28, 28, 1, nrow(Train.X))
Test.X.array <- t(Test.X)
dim(Test.X.array) <- c(28, 28, 1, nrow(Test.X))
library(mxnet)

# input
data <- mx.symbol.Variable('data')
# first conv
conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=20)
relu1 <- mx.symbol.Activation(data=conv1, act_type="relu")
pool1 <- mx.symbol.Pooling(data=relu1, pool_type="max",
                          kernel=c(2,2), stride=c(2,2))
# second conv
conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=50)
relu2 <- mx.symbol.Activation(data=conv2, act_type="relu")
pool2 <- mx.symbol.Pooling(data=relu2, pool_type="max",
                          kernel=c(2,2), stride=c(2,2))
# first fullc
flatten <- mx.symbol.Flatten(data=pool2)
fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=500)
relu3 <- mx.symbol.Activation(data=fc1, act_type="relu")

# second fullc
fc2 <- mx.symbol.FullyConnected(data=relu3, num_hidden=10)
# loss
lenet <- mx.symbol.SoftmaxOutput(data=fc2)

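– First convolution block

  1. The original image (28x28x1) first passes through 20 "convolution filters" of size 5x5 (5x5x1x20), which turns it into 20 "first-order feature maps" (24x24x20)

  2. These 20 "first-order feature maps" (24x24x20) then go through ReLU, producing 20 "transformed first-order feature maps" (24x24x20)

  3. These 20 "transformed first-order feature maps" (24x24x20) are then processed by a 2x2 "pooling filter" (2x2), which turns them into 20 "downsampled first-order feature maps" (12x12x20)

– Second convolution block

  1. The 20 "downsampled first-order feature maps" (12x12x20) pass through 50 "convolution filters" of size 5x5 (5x5x20x50), which turns them into 50 "second-order feature maps" (8x8x50)

  2. These 50 "second-order feature maps" (8x8x50) then go through ReLU, producing 50 "transformed second-order feature maps" (8x8x50)

  3. These 50 "transformed second-order feature maps" (8x8x50) are then processed by a 2x2 "pooling filter" (2x2), which turns them into 50 "downsampled second-order feature maps" (4x4x50)

– Fully connected layers

  1. The "downsampled second-order feature maps" (4x4x50) are rearranged and flattened into "first-order high-level features" (800)

  2. The "first-order high-level features" (800) enter the "hidden layer", which outputs "second-order high-level features" (500)

  3. The "second-order high-level features" (500) go through ReLU, giving "transformed second-order high-level features" (500)

  4. The "transformed second-order high-level features" (500) enter the "output layer", producing the "raw output" (10)

  5. The "raw output" (10) is transformed by the Softmax function to decide which class the image belongs to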
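The layer shapes listed above also determine how many trainable parameters the network has. The following sketch counts them in base R, assuming every convolution filter and every fully connected unit has exactly one bias term. The helper names `conv.params` and `fc.params` are illustrative.

```r
# Parameters of a convolution layer: k x k x c.in weights per filter, plus one bias per filter
conv.params <- function(k, c.in, c.out) k * k * c.in * c.out + c.out

# Parameters of a fully connected layer: one weight per input-output pair, plus one bias per output
fc.params <- function(n.in, n.out) n.in * n.out + n.out

n.params <- c(conv1 = conv.params(5, 1, 20),     # 5x5x1x20  + 20
              conv2 = conv.params(5, 20, 50),    # 5x5x20x50 + 50
              fc1   = fc.params(4 * 4 * 50, 500),
              fc2   = fc.params(500, 10))
print(n.params)
total.params <- sum(n.params)
print(total.params)
```

Under these assumptions most of the parameters sit in the first fully connected layer, not in the convolution layers.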

Handwritten Digit Recognition with a Convolutional Neural Network (4)

mx.set.seed(0)
model_1 = mx.model.FeedForward.create(lenet, X = Train.X.array, y = Train.Y,
                                      ctx = mx.cpu(), num.round = 20, array.batch.size = 100,
                                      learning.rate = 0.05, momentum = 0.9, wd = 0.00001,
                                      eval.metric = mx.metric.accuracy,
                                      epoch.end.callback = mx.callback.log.train.metric(100))
preds = predict(model_1, Test.X.array)
pred.label = max.col(t(preds)) - 1
tab = table(pred.label, Test.Y)
cat("Testing accuracy rate =", sum(diag(tab))/sum(tab))
## Testing accuracy rate = 0.9863095
print(tab)
##           Test.Y
## pred.label    0    1    2    3    4    5    6    7    8    9
##          0 1650    0    1    3    1    2    1    0    2    6
##          1    0 1839    3    0    3    0    2    1    4    1
##          2    1    2 1639    8    0    0    1   24    7    0
##          3    0    0    3 1713    0    4    1    1    1    6
##          4    0    0    1    0 1591    0    2    3    2   21
##          5    0    2    0    8    0 1530    4    0    3    5
##          6    9    1    1    0    1    6 1648    0    3    0
##          7    0    4    4    0    3    1    0 1717    0    7
##          8    1    3    3    9    1    5    2    4 1652    5
##          9    2    0    1    1    6    3    0    3    1 1591
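The step from predicted probabilities to an accuracy rate can be illustrated on a toy example in base R. The matrix `preds.toy` is made up, with one row per class (0 to 9) and one column per image, which is the layout the `max.col(t(preds)) - 1` call relies on. Wrapping both label vectors in `factor(..., levels = 0:9)` is an addition not present in the code above. It keeps the table square even if some digit is never predicted, so that `diag()` still lines up predictions with true labels.

```r
# Hypothetical softmax output for 3 images: 10 rows (classes 0-9) x 3 columns (images)
preds.toy <- matrix(0.01, nrow = 10, ncol = 3)
preds.toy[8, 1] <- 0.91   # image 1: highest probability in row 8, i.e. digit 7
preds.toy[1, 2] <- 0.91   # image 2: digit 0
preds.toy[4, 3] <- 0.91   # image 3: digit 3
true.toy <- c(7, 0, 5)    # the third image is really a 5

# Row index of the largest probability, shifted by 1 because digits start at 0
pred.toy <- max.col(t(preds.toy)) - 1

tab.toy <- table(factor(pred.toy, levels = 0:9), factor(true.toy, levels = 0:9))
acc.toy <- sum(diag(tab.toy)) / sum(tab.toy)
cat("Toy accuracy rate =", acc.toy, "\n")
```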

– If your computer has enough cores, you can use the commands below

n.cpu <- 4
device.cpu <- lapply(0:(n.cpu-1), function(i) {mx.cpu(i)})

model_2 = mx.model.FeedForward.create(lenet, X = Train.X.array, y = Train.Y,
                                      ctx = device.cpu, num.round = 1, array.batch.size = 100,
                                      learning.rate = 0.05, momentum = 0.9, wd = 0.00001,
                                      eval.metric = mx.metric.accuracy,
                                      arg.params = model_1$arg.params,
                                      epoch.end.callback = mx.callback.log.train.metric(100))

Exercise 2

PARAMS = model_1$arg.params
ls(PARAMS)
## [1] "convolution0_bias"      "convolution0_weight"   
## [3] "convolution1_bias"      "convolution1_weight"   
## [5] "fullyconnected0_bias"   "fullyconnected0_weight"
## [7] "fullyconnected1_bias"   "fullyconnected1_weight"
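
  1. The original image (28x28x1) first passes through 20 "convolution filters" of size 5x5 (5x5x1x20), which turns it into 20 "first-order feature maps" (24x24x20)

  2. These 20 "first-order feature maps" (24x24x20) then go through ReLU, producing 20 "transformed first-order feature maps" (24x24x20)

  3. These 20 "transformed first-order feature maps" (24x24x20) are then processed by a 2x2 "pooling filter" (2x2), which turns them into 20 "downsampled first-order feature maps" (12x12x20)

  4. The 20 "downsampled first-order feature maps" (12x12x20) pass through 50 "convolution filters" of size 5x5 (5x5x20x50), which turns them into 50 "second-order feature maps" (8x8x50)

  5. These 50 "second-order feature maps" (8x8x50) then go through ReLU, producing 50 "transformed second-order feature maps" (8x8x50)

  6. These 50 "transformed second-order feature maps" (8x8x50) are then processed by a 2x2 "pooling filter" (2x2), which turns them into 50 "downsampled second-order feature maps" (4x4x50)

  7. The "downsampled second-order feature maps" (4x4x50) are rearranged and flattened into "first-order high-level features" (800)

  8. The "first-order high-level features" (800) enter the "hidden layer", which outputs "second-order high-level features" (500)

  9. The "second-order high-level features" (500) go through ReLU, giving "transformed second-order high-level features" (500)

  10. The "transformed second-order high-level features" (500) enter the "output layer", producing the "raw output" (10)

  11. The "raw output" (10) is transformed by the Softmax function to decide which class the image belongs to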
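The eleven steps above can be traced in base R to confirm that the shapes fit together. This is only a shape check under stated assumptions: the weights are random instead of the trained values in `PARAMS`, fully connected biases are left out, arrays are laid out as (height, width, channel), and the flattening order is whatever `as.vector` gives. MXNet's internal layout may differ, so the numbers will not match `predict()`. All helper names (`conv.layer`, `max.pool`, `rand`, ...) are illustrative.

```r
set.seed(0)

# Multi-channel convolution, stride 1, no padding.
# x: height x width x channels.in; w: k x k x channels.in x channels.out; b: one bias per filter
conv.layer <- function(x, w, b) {
  k <- dim(w)[1]
  out <- array(0, dim = c(dim(x)[1] - k + 1, dim(x)[2] - k + 1, dim(w)[4]))
  for (f in 1:dim(w)[4]) {
    w.f <- array(w[, , , f], dim = dim(w)[1:3])
    for (i in 1:dim(out)[1]) {
      for (j in 1:dim(out)[2]) {
        patch <- x[i:(i + k - 1), j:(j + k - 1), , drop = FALSE]
        out[i, j, f] <- sum(patch * w.f) + b[f]
      }
    }
  }
  out
}

relu <- function(x) {
  x[x < 0] <- 0
  x
}

# 2x2 max pooling with stride 2, applied to every channel
max.pool <- function(x) {
  out <- array(0, dim = c(dim(x)[1] %/% 2, dim(x)[2] %/% 2, dim(x)[3]))
  for (i in 1:dim(out)[1]) {
    for (j in 1:dim(out)[2]) {
      block <- x[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j), , drop = FALSE]
      out[i, j, ] <- apply(block, 3, max)
    }
  }
  out
}

softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Random weights of a given shape (placeholder for trained parameters)
rand <- function(...) array(rnorm(prod(...), sd = 0.1), dim = c(...))

x    <- array(runif(28 * 28), dim = c(28, 28, 1))                         # input image
a1   <- max.pool(relu(conv.layer(x, rand(5, 5, 1, 20), rep(0, 20))))      # steps 1-3
a2   <- max.pool(relu(conv.layer(a1, rand(5, 5, 20, 50), rep(0, 50))))    # steps 4-6
flat <- as.vector(a2)                                                     # step 7
h    <- relu(rand(500, 800) %*% flat)                                     # steps 8-9
raw  <- rand(10, 500) %*% h                                               # step 10
prob <- softmax(raw)                                                      # step 11

dim(a1); dim(a2); length(flat); length(prob); sum(prob)
```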

Input = Test.X.array[,,,1]
dim(Input) = c(28, 28, 1, 1)
preds = predict(model_1, Input)
pred.label = max.col(t(preds)) - 1

par(mar=rep(0,4))
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
img = as.raster(t(matrix(as.numeric(Input), nrow = 28)))
rasterImage(img, -0.04, -0.04, 1.04, 1.04, interpolate=FALSE)
text(0.05, 0.95, Test.Y[1], col = "green", cex = 2)
text(0.95, 0.95, pred.label, col = "blue", cex = 2)

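Coding Discharge Summaries with a Convolutional Neural Network (1)

– Here we introduce an interesting example: using a convolutional neural network to code discharge summaries, where its performance even surpassed methods such as SVM and random forest

– With "word embedding" we can turn words into vectors. The main goal is for words with similar meanings to lie very close together in the vector space, and "word2vec" is one way to achieve this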

Coding Discharge Summaries with a Convolutional Neural Network (2)

– This file is large, so we can use the "data.table" package to read it quickly

library(data.table)
library(magrittr)
# Load plyr before dplyr so that plyr does not mask dplyr's functions
library(plyr)
library(dplyr)

word.data = fread("data/glove.6B.50d.txt", header = FALSE)
## Read 341218 rows and 51 (of 51) columns from 0.137 GB file in 00:00:12
words.ref = word.data %>% select(V1) %>% setDF %>% .[,1] %>% as.character
words.matrix = word.data %>% select(-V1) %>% setDF %>% as.matrix
rownames(words.matrix) = words.ref
dim(words.matrix)
## [1] 341218     50

– Next, let's find the words closest to "adenocarcinoma", using cosine similarity as the measure

word = "adenocarcinoma"

word.vector = words.matrix[which(words.ref==word),]
other.vectors = words.matrix[which(words.ref!=word),]

Dot_Product = other.vectors %*% word.vector
distance.word = sqrt(sum(word.vector^2))
distance.other = apply(other.vectors, 1, function(x) {sqrt(sum(x^2))})

cos_value = Dot_Product/distance.word/distance.other
cos_value = cos_value[order(cos_value[,1], decreasing = TRUE),]
head(cos_value, 10)
##      carcinoma       squamous     esophageal     carcinomas     metastatic 
##      0.8821906      0.7957962      0.7850613      0.7850401      0.7780124 
## hepatocellular nasopharyngeal       melanoma        ovarian        nodules 
##      0.7483364      0.7315709      0.7205641      0.7195940      0.7167293

Coding Discharge Summaries with a Convolutional Neural Network (3)

example = "Adenocarcinoma of stomach with peritoneal carcinomatosis and massive ascite, stage IV under bidirection chemotherapy (neoadjuvant intraperitoneal-systemic chemotherapy) with intraperitoneal paclitaxel 120mg (20151126, 20151201) and systemic with Oxalip (20151127) and oral XELOX."

text = tolower(example)
text = gsub("\n", "@@@@@", text, fixed = TRUE)
text = gsub("\r", "@@@@@", text, fixed = TRUE)
text = gsub("[ :,;-]", "@", text)
text = gsub("(", "@", text, fixed = TRUE)
text = gsub(")", "@", text, fixed = TRUE)
text = gsub("/", "@", text, fixed = TRUE)
text = strsplit(text, split = ".", fixed = TRUE)[[1]]
text = paste(text, collapse = "@@@@@")
text = strsplit(text, split = "@", fixed = TRUE)[[1]]

TEXT.ARRAY = matrix(0, nrow = length(text), ncol = 50)
for (i in 1:length(text)) {
  if (text[i]!="") {
    pos = which(words.ref == text[i])
    if (length(pos)==1) {
      TEXT.ARRAY[i,] = words.matrix[pos,]
    }
  }
}

library(imager)
img = TEXT.ARRAY
img[img>2] = 2
# The space matters: `img<-2` would be parsed as the assignment `img <- 2`
img[img < -2] = -2
plot(as.cimg(t(img)))

Coding Discharge Summaries with a Convolutional Neural Network (4)

load("data/ICD10.RData")

Train.X.array = ARRAY[,,1:3000]
dim(Train.X.array) = c(100, 50, 1, 3000)
Train.Y = LABEL[1:3000]

Vald.X.array = ARRAY[,,3001:4000]
dim(Vald.X.array) = c(100, 50, 1, 1000)
Vald.Y = LABEL[3001:4000]

Test.X.array = ARRAY[,,4001:5000]
dim(Test.X.array) = c(100, 50, 1, 1000)
Test.Y = LABEL[4001:5000]

F22_12

library(mxnet)

get_symbol_textcnn <- function(num_outcome = 1,
                               filter_sizes = 1:5,
                               num_filter = c(40, 30, 15, 10, 5),
                               Seq.length = 100,
                               word.dimation = 50,
                               dropout = 0.5) {
  
  data <- mx.symbol.Variable('data')
  
  concat_lst <- NULL
  
  for (i in 1:length(filter_sizes)) {
    convi <- mx.symbol.Convolution(data = data,
                                   kernel = c(filter_sizes[i], word.dimation),
                                   pad = c(filter_sizes[i]-1, 0),
                                   num_filter = num_filter[i],
                                   name = paste0('conv', i))
    relui <- mx.symbol.Activation(data = convi,
                                  act_type = "relu",
                                  name = paste0('relu', i))
    pooli <- mx.symbol.Pooling(data = relui,
                               pool_type = "max",
                               kernel = c(Seq.length + filter_sizes[i] - 1, 1),
                               stride = c(1, 1),
                               name = paste0('pool', i))
    concat_lst = append(concat_lst, pooli)
  }
  
  concat_lst$num.args = length(filter_sizes)
  
  h_pool = mxnet:::mx.varg.symbol.Concat(concat_lst)
  
  # dropout layer
  
  if (dropout > 0) {
    h_drop = mx.symbol.Dropout(data = h_pool, p = dropout)
  } else {
    h_drop = h_pool
  }
  
  # fully connected layer
  
  cls_weight = mx.symbol.Variable('cls_weight')
  cls_bias = mx.symbol.Variable('cls_bias')
  
  fc = mx.symbol.FullyConnected(data = h_drop,
                                weight = cls_weight,
                                bias = cls_bias,
                                num_hidden = num_outcome)
  lr = mx.symbol.LogisticRegressionOutput(fc, name='lr')
  
  return(lr)
}

my.eval.metric.CE <- mx.metric.custom(
  name = "Cross-Entropy (CE)", 
  function(real, pred) {
    real1 = as.numeric(real)
    pred1 = as.numeric(pred)
    pred1[pred1 <= 1e-6] = 1e-6
    pred1[pred1 >= 1 - 1e-6] = 1 - 1e-6
    return(-mean(real1 * log(pred1) + (1 - real1) * log(1 - pred1), na.rm = TRUE))
  }
)

mx.callback.early.stop <- function(period, logger = NULL, small.value = "good", tolerance = 1e-4) {
  function(iteration, nbatch, env, verbose) {
    if (nbatch %% period == 0 && !is.null(env$metric)) {
      result <- env$metric$get(env$train.metric)
      if (nbatch != 0) {
        if(verbose) {cat(paste0("Batch [", nbatch, "] Train-", result$name, "=", result$value, "\n"))}
      }
      if (!is.null(logger)) {
        if (class(logger) != "mx.metric.logger") {
          stop("Invalid mx.metric.logger.")
        } else {
          logger$train <- c(logger$train, result$value)
          if (!is.null(env$eval.metric)) {
            result <- env$metric$get(env$eval.metric)
            if (nbatch != 0) {cat(paste0("Batch [", nbatch, "] Validation-", result$name, "=", result$value, "\n"))}
            logger$eval <- c(logger$eval, result$value)
          }
        }
      }
    }
    if (!is.null(env$metric)) {
      if (length(logger$train) >= 10) {
        if (!is.null(env$eval.metric)) {TEST.VALUE = round(logger$eval/tolerance)} else {TEST.VALUE = round(logger$train/tolerance)}
        if (small.value=="good") {
          if (mean(tail(TEST.VALUE, 10)) <= mean(tail(TEST.VALUE, 5))) {return(FALSE)}
        } else {
          if (mean(tail(TEST.VALUE, 10)) >= mean(tail(TEST.VALUE, 5))) {return(FALSE)}
        }
      }
    }
    return(TRUE)
  }
}
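The stopping rule inside `mx.callback.early.stop` can be studied on its own, without MXNet. The sketch below isolates the comparison between the mean of the last 10 logged values and the mean of the last 5. The function `should.continue` and the two loss histories are hypothetical names and made-up numbers for illustration.

```r
# Returns TRUE if training should go on, FALSE if it should stop.
# history: metric values logged so far (e.g. validation cross-entropy)
should.continue <- function(history, small.value = "good", tolerance = 1e-4) {
  if (length(history) < 10) return(TRUE)          # not enough history yet
  test.value <- round(history / tolerance)        # ignore changes smaller than the tolerance
  if (small.value == "good") {
    # the recent 5 must be lower on average than the recent 10
    mean(tail(test.value, 10)) > mean(tail(test.value, 5))
  } else {
    mean(tail(test.value, 10)) < mean(tail(test.value, 5))
  }
}

improving <- seq(1, 0.1, length.out = 10)                    # loss keeps falling
plateau   <- c(seq(1, 0.5, length.out = 5), rep(0.5, 10))    # loss has flattened out

should.continue(improving)
should.continue(plateau)
```

With a falling loss the last 5 values average lower than the last 10, so training continues. Once the loss is flat the two means coincide and the rule stops training.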

Coding Discharge Summaries with a Convolutional Neural Network (5)

mx.set.seed(0)

logger = mx.metric.logger$new()

cnn.model = mx.model.FeedForward.create(get_symbol_textcnn(),
                                        X = Train.X.array, y = Train.Y,
                                        eval.data = list(data = Vald.X.array, label = Vald.Y),
                                        ctx = mx.cpu(), num.round = 100,
                                        array.batch.size = 100, learning.rate = 0.05,
                                        momentum = 0.9, wd = 0.00001,
                                        eval.metric = my.eval.metric.CE,
                                        epoch.end.callback = mx.callback.early.stop(100, logger, small.value = "good"))
pred.prob = predict(cnn.model, Test.X.array)
pred.y = pred.prob>0.5
tab = table(Test.Y, pred.y)
cat("Testing accuracy rate =", sum(diag(tab))/sum(tab))
## Testing accuracy rate = 0.943
print(tab)
##       pred.y
## Test.Y FALSE TRUE
##      0   663   31
##      1    26  280

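Exercise 3

– Note, however, that because the input used to train "cnn.model" has dimensions (100, 50, 1, n), you need to pad your text with blanks up to 100 even if it has fewer than 100 words!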

1. Adenocarcinoma of stomach with peritoneal carcinomatosis and massive ascite, stage IV under bidirection chemotherapy (neoadjuvant intraperitoneal-systemic chemotherapy) with intraperitoneal paclitaxel 120mg (20151126, 20151201) and systemic with Oxalip (20151127) and oral XELOX.

2. Chronic kidney disease, stage V with pulmonary edema underwent emergent hemodialysis, status post arteriovenous graft creation with maintenance hemodialysis.
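A possible way to handle the padding is sketched below in base R. It makes several assumptions: blanks are represented by all-zero rows placed at the end, unknown words also become zero rows, and text longer than 100 tokens is truncated. Whether this matches how `ARRAY` in the course data was built is not confirmed. The helper `tokens.to.array` and the 3-dimensional toy embedding table (`toy.ref`, `toy.matrix`) are hypothetical stand-ins for `words.ref` and `words.matrix`.

```r
# Hypothetical 3-dimensional embedding table standing in for words.matrix
toy.ref <- c("chronic", "kidney", "disease")
toy.matrix <- matrix(1:9 / 10, nrow = 3, byrow = TRUE,
                     dimnames = list(toy.ref, NULL))

# Turn a token vector into a (seq.length, embedding size, 1, 1) array,
# leaving all-zero rows for blanks, unknown words and unused positions
tokens.to.array <- function(tokens, ref, emb, seq.length = 100) {
  tokens <- head(tokens, seq.length)   # assumption: drop anything beyond seq.length
  out <- matrix(0, nrow = seq.length, ncol = ncol(emb))
  for (i in seq_along(tokens)) {
    pos <- which(ref == tokens[i])
    if (length(pos) == 1) {
      out[i, ] <- emb[pos, ]
    }
  }
  dim(out) <- c(seq.length, ncol(emb), 1, 1)
  out
}

toy.tokens <- c("chronic", "kidney", "disease", "", "stage")
x.toy <- tokens.to.array(toy.tokens, toy.ref, toy.matrix)
dim(x.toy)
```

With the real 50-dimensional vectors, the same call on the tokenized sentence would give an array of dimension (100, 50, 1, 1), which has the shape the model's input expects.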

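Summary

– Compared with ordinary neural networks, the success of convolutional neural networks is mostly attributed to parameter sharing in the convolutional layers.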
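To get a feel for how much parameter sharing saves, the sketch below compares the first convolution layer of the digit network (20 filters of size 5x5 on a 28x28x1 image) with a hypothetical fully connected layer that maps the same 784 inputs to the same 24x24x20 outputs. One bias per output unit or per filter is assumed.

```r
n.in  <- 28 * 28          # 784 input pixels
n.out <- 24 * 24 * 20     # 11520 output values

# Fully connected: every output has its own weight for every input
fc.count <- n.in * n.out + n.out

# Convolution: the same 5x5 weights are reused at every position
conv.count <- 5 * 5 * 1 * 20 + 20

c(fully.connected = fc.count,
  convolution = conv.count,
  ratio = round(fc.count / conv.count))
```

Under these assumptions the fully connected version needs roughly 17,000 times as many parameters to produce an output of the same size.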

F22_10

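– Convolutional neural networks can recognize not only images but also documents, speech and so on: as long as the data can be structured as a 2-dimensional abstract "image", a convolutional neural network can be applied.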

F22_11